SMOTE-ENC: A Novel SMOTE-Based Method to Generate Synthetic Data for Nominal and Continuous Features

نویسندگان

چکیده

Real-world datasets are heavily skewed where some classes significantly outnumbered by the other classes. In these situations, machine learning algorithms fail to achieve substantial efficacy while predicting underrepresented instances. To solve this problem, many variations of synthetic minority oversampling methods (SMOTE) have been proposed balance which deal with continuous features. However, for both nominal and features, SMOTE-NC is only SMOTE-based technique data. paper, we present a novel method, SMOTE-ENC (SMOTE—Encoded Nominal Continuous), in features encoded as numeric values difference between two such reflects amount change association class. Our experiments show that classification models using method offer better prediction than when dataset has number also there categorical target Additionally, our addressed one major limitations algorithm. can be applied on mixed consisting cannot function if all nominal. generalized nominal-only datasets.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

SMOTE: Synthetic Minority Over-sampling Technique

An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of “normal” examples with only a small percentage of “abnormal” or “interesting” examples. It is also the case that the cost of misclassifying an abnormal (i...

متن کامل

Data Preprocessing for Liver Dataset Using SMOTE

-The class imbalanced problem occurs in various disciplines when one of target classes has a small number of instances compare to other classes. A classifier normally ignores or neglects to detect a minority class due to the small number of class instances. It poses a challenge to any classifier as it becomes hard to learn the minority class samples. Most of the oversampling methods may generat...

متن کامل

SMOTE for Regression

Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal we have a problem of class imbalance that was already studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important application areas involve forecasting rare extr...

متن کامل

A Novel Ensemble Method for Imbalanced Data Learning: Bagging of Extrapolation-SMOTE SVM

Class imbalance ubiquitously exists in real life, which has attracted much interest from various domains. Direct learning from imbalanced dataset may pose unsatisfying results overfocusing on the accuracy of identification and deriving a suboptimal model. Various methodologies have been developed in tackling this problem including sampling, cost-sensitive, and other hybrid ones. However, the sa...

متن کامل

RBM-SMOTE: Restricted Boltzmann Machines for Synthetic Minority Oversampling Technique

The problem of imbalanced data, i.e., when the class labels are unequally distributed, is encountered in many real-life application, e.g., credit scoring, medical diagnostics. Various approaches aimed at dealing with the imbalanced data have been proposed. One of the most well known data pre-processing method is the Synthetic Minority Oversampling Technique (SMOTE). However, SMOTE may generate ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Applied system innovation

سال: 2021

ISSN: ['2571-5577']

DOI: https://doi.org/10.3390/asi4010018